
[PerfXLab] optimize fill performance#2216

Open
bin913 wants to merge 5 commits into flagos-ai:master from bin913:fill

Conversation

Contributor

@bin913 bin913 commented Apr 2, 2026

PR Category

[Operator]

Type of Change

[Performance Optimization]

Description

Optimize the performance of fill.fill_scalar_ for the fill operator.

Issue

Progress

  • Change is properly reviewed (1 reviewer required, 2 recommended).
  • Change responds to an issue.
  • Change is fully covered by a UT.

Performance

Operator: fill_scalar_  Performance Test (dtype=torch.float16, mode=kernel, level=core)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               0.653856            0.653792               1.000          [torch.Size([1073741824]), 3.14159]
SUCCESS               0.005008            0.004992               1.003          [torch.Size([64, 64]), 3.14159]
SUCCESS               0.015104            0.015296               0.987          [torch.Size([4096, 4096]), 3.14159]
SUCCESS               0.015328            0.015328               1.000          [torch.Size([64, 512, 512]), 3.14159]
SUCCESS               0.654080            0.654224               1.000          [torch.Size([1024, 1024, 1024]), 3.14159]


Operator: fill_scalar_  Performance Test (dtype=torch.float32, mode=kernel, level=core)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               1.305952            1.303360               1.002          [torch.Size([1073741824]), 3.14159]
SUCCESS               0.005120            0.005024               1.019          [torch.Size([64, 64]), 3.14159]
SUCCESS               0.025280            0.025376               0.996          [torch.Size([4096, 4096]), 3.14159]
SUCCESS               0.025120            0.025216               0.996          [torch.Size([64, 512, 512]), 3.14159]
SUCCESS               1.306016            1.303264               1.002          [torch.Size([1024, 1024, 1024]), 3.14159]


Operator: fill_scalar_  Performance Test (dtype=torch.bfloat16, mode=kernel, level=core)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               0.654176            0.654080               1.000          [torch.Size([1073741824]), 3.14159]
SUCCESS               0.004992            0.005152               0.969          [torch.Size([64, 64]), 3.14159]
SUCCESS               0.015232            0.015296               0.996          [torch.Size([4096, 4096]), 3.14159]
SUCCESS               0.015488            0.015328               1.010          [torch.Size([64, 512, 512]), 3.14159]
SUCCESS               0.653888            0.653904               1.000          [torch.Size([1024, 1024, 1024]), 3.14159]

# tensor constructor with given value
("fill_", torch.fill_, fill_input_fn),
("fill_scalar_", torch.ops.aten.fill_.Scalar, fill_input_fn),
# ("fill_scalar_", flag_gems.ops.fill.fill_scalar_, fill_input_fn),
Collaborator


Why is the FlagGems benchmark for fill_scalar_ commented out?

Contributor Author


Sorry, line 194 should be removed. I will remove it.



def fill_scalar(input, value):
logger.debug("GEMS FILL_SCALAR HOPPER")
Contributor


GEMS_HOPPER FILL_SCALAR



def fill_scalar_out(input, value, *, out=None):
logger.debug("GEMS FILL_SCALAR_OUT HOPPER")
Contributor


GEMS_HOPPER FILL_SCALAR_OUT

def fill_tensor(input, value):
if not value.is_cuda:
return fill_scalar(input, value.item())
logger.debug("GEMS FILL_TENSOR HOPPER")
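
The CPU-tensor fallback in the snippet above can be sketched in isolation. This is a hypothetical illustration with stub classes, not the FlagGems implementation: when the fill value is a host-resident 0-d tensor, value.item() extracts a Python scalar so dispatch can take the scalar path rather than reading the value from the device.

```python
class FakeTensor:
    """Minimal stand-in for a tensor holding a single value (illustrative stub)."""

    def __init__(self, value, is_cuda):
        self.value = value
        self.is_cuda = is_cuda

    def item(self):
        # Mirrors torch.Tensor.item(): return the contents as a Python scalar.
        return self.value


def fill_scalar(input, value):
    # Scalar path: every element of `input` becomes the Python scalar `value`.
    input.value = value
    return input


def fill_tensor(input, value):
    # Fallback from the snippet above: a host-resident value tensor is
    # converted to a scalar, so the scalar kernel path is taken instead.
    if not value.is_cuda:
        return fill_scalar(input, value.item())
    input.value = value.value  # device path (stubbed out here)
    return input


out = fill_tensor(FakeTensor(0.0, is_cuda=True), FakeTensor(3.14159, is_cuda=False))
print(out.value)  # 3.14159
```

The same dispatch shape appears again below in fill_tensor_, the in-place variant.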
Contributor


Suggested change
logger.debug("GEMS FILL_TENSOR HOPPER")
logger.debug("GEMS_HOPPER FILL_TENSOR")



def fill_tensor_out(input, value, *, out=None):
logger.debug("GEMS FILL_TENSOR_OUT HOPPER")
Contributor


Suggested change
logger.debug("GEMS FILL_TENSOR_OUT HOPPER")
logger.debug("GEMS_HOPPER FILL_TENSOR_OUT")

def fill_tensor_(self, value):
if not value.is_cuda:
return fill_scalar_(self, value.item())
logger.debug("GEMS FILL_TENSOR_ HOPPER")
Contributor


Suggested change
logger.debug("GEMS FILL_TENSOR_ HOPPER")
logger.debug("GEMS_HOPPER FILL_TENSOR_")



def fill_scalar_(self, value):
logger.debug("GEMS FILL_SCALAR_ HOPPER")
Contributor


Suggested change
logger.debug("GEMS FILL_SCALAR_ HOPPER")
logger.debug("GEMS_HOPPER FILL_SCALAR_")



4 participants